ABSTRACT


This  paper  reviews  the  application  of  the  EM  Algorithm  to  marginal 
maximum  likelihood  estimation  of  parameters  in  the  latent  class  model  and 
extends  the  algorithm  to  the  case  where  there  are  monotone  homogeneity 
constraints  on  the  item  parameters.  A  likelihood  ratio  test  of  the 
hypothesis  of  monotone  homogeneity  is  proposed.  The  hypothesis  is  of 
interest because all standard item response theory models assume that it
holds.


INTRODUCTION 


The  purpose  of  this  paper  is  twofold:  first,  to  review  the 
application  of  the  EM  algorithm  of  Dempster,  Laird,  and  Rubin  (1977)  to 
marginal  maximum  likelihood  estimation  of  parameters  in  the  latent  class 
model,  and  second,  to  extend  this  algorithm  to  the  case  where  there  are 
monotone  homogeneity  constraints  on  the  item  parameters. 

Let  us  briefly  review  the  elements  of  latent  class  models.  The 
reader  desiring  a  thorough  introduction  can  consult  Lazarsfeld  and  Henry 
(1968).  The  data  to  be  accounted  for  are  vectors  of  responses  to  items. 

In  this  paper  we  are  only  concerned  with  dichotomous  item  responses, 
although  many  of  the  ideas  can  be  generalized  to  the  case  of  polychotomous 
responses  (cf.  Goodman,  1974).  It  is  assumed  that  every  subject  belongs 
to  exactly  one  of  a  finite  set  of  mutually  exclusive  and  exhaustive  latent 
classes.  Theoretically,  the  distribution  of  the  response  vectors  is  to  be 
accounted for by two sets of parameters and one key assumption. The two
sets of parameters are the state probabilities, $(v_k)$, governing the
multinomial distribution of subjects over the latent classes, and the
conditional probabilities of correct response to each item, given the
respective states, $(p_{kj})$. The key assumption is that the responses are
conditionally independent, given the state of the subject. This implies that
any relationships between items must be explained in terms of differences
in the $p_{kj}$'s between classes. Models are specified by stipulating the
number  of  classes  and  by  placing  constraints  on  the  matrix  of  conditional 
probabilities. 

For some time the problem of estimating parameters in latent class
structures  presented  a  real  obstacle  in  the  application  of  the  latent 
class  framework.  It  is  necessary  to  employ  iterative  procedures  in  which 
one  selects  a  set  of  trial  values,  improves  upon  these  values  in  the  light 
of  the  data  via  some  appropriate  algorithm,  and  then  repeats  the  process 
until  (one  hopes)  the  values  stabilize  at  a  good  solution.  McHugh  (1956) 
derived  the  maximum  likelihood  estimators,  but  his  solution  applies  only 
to  the  unconstrained  model. 

A great advance was achieved when Goodman (1974) described a
particularly simple iterative procedure which also has the virtue of
automatically  producing  estimates  of  probabilities  which  fall  in  the  unit 
interval;  furthermore  it  is  very  easy  to  modify  the  procedure  to  satisfy  a 
fair  variety  of  other  constraints  on  the  parameters.  There  is  one  problem 
which  Goodman's  procedure  shares  with  McHugh's,  however.  Both  procedures 
take  as  their  data  the  frequency  counts  in  the  cells  of  the  multi-way 
item-by-item-...-by-item contingency table. For a relatively small number
of  items  this  presents  no  problem.  But  the  number  of  cells  grows 
exponentially  with  the  number  of  items,  so  calculations  which  require 
dealing  with  all  these  cells  become  impractical  very  quickly  as  the  number 
of  items  is  increased.  For  example,  the  contingency  table  for  data  from  a 
20 item test would have $2^{20}$ cells, which is more than a million.
Fortunately,  it  is  possible  to  formulate  an  algorithm  which  is  logically 
equivalent  to  Goodman's,  but  which  circumvents  the  problem  of  dealing  with 
all  possible  cells  in  the  n-way  contingency  table. 


Goodman's  algorithm  and  the  modification  of  it  to  be  presented  here 
are  just  special  cases  of  the  EM  algorithm  applied  to  the  latent  class 
model.  It  will  be  useful  to  carefully  review  the  rationale  behind  the 
application  of  the  EM  algorithm  in  the  latent  class  model  context,  in 
order  to  lay  the  groundwork  for  the  extension  of  the  algorithm  to  cover 
monotone  homogeneity  constraints  on  the  item  parameters.  The  rationale 
will be developed in the next section and extended to monotonely
homogeneous  items  in  the  section  after  that. 

APPLICATION  OF  THE  EM  ALGORITHM  TO 
THE  LATENT  CLASS  MODEL 

Estimation  of  parameters  in  the  latent  class  model  would  be  easy  if 
we  knew  the  state  of  each  subject.  The  maximum  likelihood  estimates  of 
the  distribution  of  subjects  over  states  would  just  be  the  sample 
proportions  falling  in  the  respective  states.  The  estimates  of 
conditional  response  probabilities  to  items,  given  state,  would  be  the 
corresponding  sample  proportions  of  item  responses. 

The  missing  data  about  the  states  of  the  respective  subjects  turns  an 
easy  problem  into  a  hard  one.  Problems  with  this  general  character,  which 
would  be  manageable  if  only  some  crucial  information  were  not  missing, 
occur  in  many  contexts.  They  have  inspired  numerous  special  algorithms, 
often  of  the  following  form: 


1.  Make  an  initial  guess  at  the  parameter  values. 


2.  Using  this  guess,  make  an  informed  guess  regarding  the  missing 
data. 

3.  Using  this  informed  guess  in  place  of  the  missing  data,  apply 
the  procedure  you  would  ordinarily  use  to  estimate  parameter 
values. 

4.  Replace  the  initial  guess  at  the  parameter  values  with  the 
latter  estimates  and  repeat  the  process  until  the  parameter 
estimates  in  steps  1  and  3  no  longer  differ  significantly. 

Dempster, Laird, and Rubin (1977) synthesized these many special
algorithms into a general approach, which they call the EM algorithm,
and showed that under fairly general conditions, if maximum likelihood
procedures are used at each iteration in Steps 2 and 3 above, the
algorithm converges to marginal maximum likelihood estimates.

In  describing  how  this  process  works  in  the  case  of  the  latent 
class  model,  let  us  use  the  following  notation. 


$x_{ij}$ = 1 if subject i is correct on item j, and 0 if subject i is
incorrect on item j;

n = the number of subjects;

J = the number of items;

$v_k$ = the probability of a subject being in class or state k;

s = the number of latent classes or states;

$\mathbf{v} = (v_1, \ldots, v_s)$, the vector of state probabilities;

$\mathbf{x}_i$ = the vector of responses of subject i to items j = 1, ..., J;

$p_{kj}$ = the conditional probability that a subject in state k will
respond correctly to item j;

$P = (p_{kj})$, the states-by-items matrix of conditional response
probabilities;

$\mathbf{e}_k$ = unit vector with 1 in the k-th coordinate and 0's everywhere
else, k = 1, ..., s;

$\mathbf{z}_i$ = the unit vector $\mathbf{e}_k$ corresponding to the state of subject i.

Note that $\mathbf{e}_k'\mathbf{z}_i$ is 1 if subject i is in state k, and is 0 otherwise.

Let us denote the conditional probability of obtaining response
vector $\mathbf{x}_i$, given that subject i is in state k, by

$$l_k(\mathbf{x}_i) = l_k(\mathbf{x}_i \mid P) = \prod_{j=1}^{J} p_{kj}^{x_{ij}} (1 - p_{kj})^{1 - x_{ij}}. \tag{1}$$


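The likelihood of the "complete" data, that is, the joint
likelihood of responses and missing state membership vectors, is
given by

$$L(\mathbf{x}_1, \ldots, \mathbf{x}_n, \mathbf{z}_1, \ldots, \mathbf{z}_n \mid P, \mathbf{v}) = \prod_{i=1}^{n} \prod_{k=1}^{s} \bigl[ v_k\, l_k(\mathbf{x}_i) \bigr]^{\mathbf{e}_k'\mathbf{z}_i}.$$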

Let $I_k$ be the set of indices for subjects in state k, i.e. those
for whom $\mathbf{e}_k'\mathbf{z}_i = 1$. Then the likelihood of the complete data can be
rewritten as

$$L(\mathbf{x}_1, \ldots, \mathbf{x}_n; \mathbf{z}_1, \ldots, \mathbf{z}_n \mid P, \mathbf{v}) = \prod_{k=1}^{s} \prod_{i \in I_k} v_k\, l_k(\mathbf{x}_i) \tag{2}$$

$$= \Bigl( \prod_{k=1}^{s} v_k^{n_k} \Bigr) \Bigl( \prod_{k=1}^{s} \prod_{j=1}^{J} p_{kj}^{r_{kj}} (1 - p_{kj})^{n_k - r_{kj}} \Bigr), \tag{3}$$

where $n_k$ denotes the number of subjects in state k and

$$r_{kj} = \sum_{i \in I_k} x_{ij}$$

denotes the number of correct responses to item j from subjects in
state k. Thus, the likelihood of the complete data is an exponential
family and the $r_{kj}$'s and $n_k$'s are sufficient statistics for the
likelihood.


The  likelihood  function  for  the  complete  data  is  to  be  contrasted 
with  the  marginal  likelihood  function  of  the  data  actually  observed. 
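The marginal likelihood of the response vector for a given subject
is the average over states of the conditional probabilities of the
response vector, given the states,

$$l^*(\mathbf{x}_i) = \sum_{k=1}^{s} v_k\, l_k(\mathbf{x}_i). \tag{4}$$

The marginal likelihood of all the observed response vectors is given
by

$$L^*(\mathbf{x}_1, \ldots, \mathbf{x}_n \mid P, \mathbf{v}) = E_z\{ L(\mathbf{x}_1, \ldots, \mathbf{x}_n, \mathbf{z}_1, \ldots, \mathbf{z}_n \mid P, \mathbf{v}) \} = \prod_{i=1}^{n} l^*(\mathbf{x}_i). \tag{5}$$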
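
As an illustration of Equations 1, 4 and 5, the following minimal sketch
evaluates the conditional likelihood of a response vector, its marginal
likelihood, and the logarithm of the marginal likelihood of a sample. The
function names and the two-state, three-item parameter values are
hypothetical and serve only as an example; they are not taken from the paper.

```python
import math


def cond_likelihood(x, p_k):
    """Equation 1: probability of response vector x given one state's p_kj row."""
    prob = 1.0
    for x_ij, p_kj in zip(x, p_k):
        prob *= p_kj if x_ij == 1 else (1.0 - p_kj)
    return prob


def marginal_likelihood(x, P, v):
    """Equation 4: average of Equation 1 over states, weighted by v_k."""
    return sum(v_k * cond_likelihood(x, p_k) for v_k, p_k in zip(v, P))


def log_marginal_likelihood(X, P, v):
    """Logarithm of Equation 5: sum over subjects of log l*(x_i)."""
    return sum(math.log(marginal_likelihood(x, P, v)) for x in X)


# Hypothetical parameter values: 2 states (rows) by 3 items (columns).
P = [[0.2, 0.3, 0.1],
     [0.8, 0.9, 0.7]]
v = [0.4, 0.6]
X = [[1, 1, 0], [0, 0, 0], [1, 1, 1]]

print(marginal_likelihood(X[0], P, v))
print(log_marginal_likelihood(X, P, v))
```

Note that the computation touches only the response vectors actually
observed, never the full set of $2^J$ possible response patterns.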

It is possible to attack the maximization of Equation 5 directly,
but doing so leads to a cumbersome system of nonlinear equations.
Another approach suggested by the relationship between the likelihood
function for the complete data and the marginal likelihood can be used
to maximize the latter indirectly.

Taking logarithms of the likelihood of the complete data in
Equation 3 yields

$$\log L = \sum_{k=1}^{s} n_k \log v_k + \sum_{k=1}^{s} \sum_{j=1}^{J} \bigl[ r_{kj} \log p_{kj} + (n_k - r_{kj}) \log(1 - p_{kj}) \bigr]. \tag{6}$$


If we knew the state of each subject, we could count the $n_k$'s and
$r_{kj}$'s and the standard ratios of these frequencies would be seen to
be the maximum likelihood estimators of the $v_k$'s and $p_{kj}$'s.

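Suppose we symbolically calculate the conditional expectation of
Equation 6, given the observable response vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$ and
trial values of the parameters, $P_0$ and $\mathbf{v}_0$:

$$E[\log L(\mathbf{x}_1, \ldots, \mathbf{x}_n, \mathbf{z}_1, \ldots, \mathbf{z}_n \mid P, \mathbf{v}) \mid \mathbf{x}_1, \ldots, \mathbf{x}_n, P_0, \mathbf{v}_0]$$

$$= \sum_{k=1}^{s} E_0(n_k) \log v_k + \sum_{k=1}^{s} \sum_{j=1}^{J} \bigl[ E_0(r_{kj}) \log p_{kj} + E_0(n_k - r_{kj}) \log(1 - p_{kj}) \bigr], \tag{7}$$

where $E_0(\cdot)$ denotes the conditional expectation, given the responses and
trial parameter values, $E(\cdot \mid \mathbf{x}_1, \ldots, \mathbf{x}_n, P_0, \mathbf{v}_0)$.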

Let $v_{k0}$ denote the k-th coordinate of the trial state probability
vector, and let $v_{ki}$ denote the conditional probability that subject i
is in state k, given the subject's responses, $\mathbf{x}_i$, and the trial
parameter values $P_0$ and $\mathbf{v}_0$. By Bayes' theorem,

$$v_{ki} = \frac{v_{k0}\, l_k(\mathbf{x}_i \mid P_0)}{\sum_{h=1}^{s} v_{h0}\, l_h(\mathbf{x}_i \mid P_0)}. \tag{8}$$
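
The Bayes' theorem computation of Equation 8 can be sketched as follows:
each state's trial probability is multiplied by the conditional likelihood
of the subject's responses in that state, and the products are normalized
to sum to one. The function name and the numerical values are hypothetical
and are used only for illustration.

```python
def posterior_state_probs(x, P0, v0):
    """Equation 8: probability of each state given responses x and trial values."""
    joint = []
    for v_k0, p_k in zip(v0, P0):
        like = 1.0
        for x_ij, p_kj in zip(x, p_k):
            like *= p_kj if x_ij == 1 else (1.0 - p_kj)
        joint.append(v_k0 * like)  # numerator of Bayes' theorem for state k
    total = sum(joint)             # marginal likelihood of x (the denominator)
    return [w / total for w in joint]


# Hypothetical trial values: 2 states by 3 items.
P0 = [[0.2, 0.3, 0.1],
      [0.8, 0.9, 0.7]]
v0 = [0.4, 0.6]
post = posterior_state_probs([1, 1, 0], P0, v0)
print(post)
```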


The conditional expectations $E_0(n_k)$ and $E_0(r_{kj})$ can
easily be computed in terms of the $v_{ki}$'s. Recall that the scalar
product of the state vector for subject i, $\mathbf{z}_i$, and the unit vector
corresponding to state k, $\mathbf{e}_k$, is 1 if subject i is in state k and 0
otherwise. Thus,

$$n_k = \sum_{i=1}^{n} \mathbf{e}_k'\mathbf{z}_i$$

and

$$r_{kj} = \sum_{i=1}^{n} x_{ij}\, \mathbf{e}_k'\mathbf{z}_i.$$

The expected number of subjects in state k, given the responses
and trial values of the parameters, is

$$E_0(n_k) = \sum_{i=1}^{n} v_{ki}. \tag{9}$$

The expected number of correct responses to item j from subjects
in state k, given the responses and trial parameter values, is

$$E_0(r_{kj}) = E\Bigl( \sum_{i=1}^{n} x_{ij}\, \mathbf{e}_k'\mathbf{z}_i \Bigm| \mathbf{x}_1, \ldots, \mathbf{x}_n; P_0, \mathbf{v}_0 \Bigr)$$

$$= \sum_{i=1}^{n} x_{ij}\, P(\mathbf{z}_i = \mathbf{e}_k \mid \mathbf{x}_1, \ldots, \mathbf{x}_n; P_0, \mathbf{v}_0)$$

$$= \sum_{i=1}^{n} x_{ij}\, v_{ki}. \tag{10}$$
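
Equations 9 and 10 amount to summing posterior state probabilities over
subjects, with or without weighting by the item response. A minimal sketch,
assuming the posterior probabilities have already been computed and stored
in a subjects-by-states list `V` (a hypothetical name, as are the example
values):

```python
def expected_counts(X, V):
    """Return (exp_n, exp_r) following Equations 9 and 10.

    X[i][j] is the 0/1 response of subject i to item j.
    V[i][k] is the posterior probability that subject i is in state k.
    """
    n_subjects, n_items, n_states = len(X), len(X[0]), len(V[0])
    # Equation 9: expected number of subjects in each state.
    exp_n = [sum(V[i][k] for i in range(n_subjects)) for k in range(n_states)]
    # Equation 10: expected number of correct responses per state and item.
    exp_r = [[sum(X[i][j] * V[i][k] for i in range(n_subjects))
              for j in range(n_items)]
             for k in range(n_states)]
    return exp_n, exp_r


# Hypothetical data: 3 subjects, 2 items, 2 states.
X = [[1, 0], [1, 1], [0, 0]]
V = [[0.25, 0.75], [0.10, 0.90], [0.80, 0.20]]
exp_n, exp_r = expected_counts(X, V)
print(exp_n, exp_r)
```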

Equations 8, 9 and 10 enable us to compute numbers to use in
Equation 7. Note that the trial values of the parameters $P_0$ and $\mathbf{v}_0$
used to compute the $E_0(n_k)$'s and $E_0(r_{kj})$'s are distinct from the
parameters $P$ and $\mathbf{v}$, which are free variables in the likelihood functions
given in Equations 5, 6 and 7.

It is often relatively easy to maximize Equation 7. If the
resulting parameter estimates differ from the trial values $P_0$ and
$\mathbf{v}_0$, they will yield higher values of the marginal likelihood function
$L^*$ than $P_0$ and $\mathbf{v}_0$, though they may not maximize $L^*$. If they do not
differ from $P_0$ and $\mathbf{v}_0$, then $P_0$ and $\mathbf{v}_0$ are also solutions to the
marginal likelihood equations which result from setting the partial
derivatives of $\log L^*$ equal to zero. This fact is established by
Dempster et al. (1977) for problems in which the likelihood of the
complete data is an exponential family, as is the case here. Sometimes
there  are  multiple  possible  solutions  to  the  marginal  maximum 
likelihood  equations  and  the  question  arises  whether  a  given  solution 
is  the  global  maximum  of  the  likelihood  function.  Ways  of  dealing  with 
this  problem  will  be  discussed  later  in  this  section. 

Finding values of $P$ and $\mathbf{v}$ to maximize Equation 7 breaks down
conveniently into two subproblems: maximization of

$$L_v = \sum_{k=1}^{s} E_0(n_k) \log v_k \tag{11}$$

with respect to the vector of state probabilities, $\mathbf{v}$, and maximization
of

$$L_P = \sum_{k=1}^{s} \sum_{j=1}^{J} \bigl[ E_0(r_{kj}) \log p_{kj} + E_0(n_k - r_{kj}) \log(1 - p_{kj}) \bigr] \tag{12}$$

with respect to the matrix of conditional response probabilities, $P$.

If no constraints are placed on the parameters, the solution to the
first problem is given by

$$\hat{v}_k = E_0(n_k) / n. \tag{13}$$

Let $\theta$ represent a generic item parameter, possibly affecting
several of the $p_{kj}$'s. In maximizing the part of Equation 7 which
depends on the free item parameters, we set the partial derivative of
Equation 12 with respect to $\theta$ equal to zero; the resulting equation can
be arranged to read

$$\sum_{j=1}^{J} \sum_{k=1}^{s} \frac{E_0(r_{kj}) - E_0(n_k)\, p_{kj}}{p_{kj}(1 - p_{kj})} \, \frac{\partial p_{kj}}{\partial \theta} = 0. \tag{15}$$


Equality  and  complementarity  constraints 

In general, Equation 15 leads to a system of nonlinear equations
which can be very difficult to solve. However, there are some special
cases which are easy to handle. For example, if there are no
constraints on the $p_{kj}$'s, then each $p_{kj}$ is a distinct parameter
affecting only one term in the sum in Equation 15. The partial
derivative with respect to $p_{kj}$ itself is 1, all the other partials
are 0, and we obtain $\hat{p}_{kj} = E_0(r_{kj}) / E_0(n_k)$ as the solution.

More generally, solution is easy if we only wish to impose
equality or complementarity constraints, so that we require $p_{kj} = \theta$
for one set of $p_{kj}$'s, $p_{kj} = 1 - \theta$ for another set, and no $p_{kj}$
outside of these sets depends on $\theta$. Then $p_{kj}(1 - p_{kj})$ equals $\theta(1 - \theta)$,
independent of subscript, for all j,k such that the partial derivative
$\partial p_{kj} / \partial \theta$ is nonzero. The partial derivative is 1 for $p_{kj}$'s equal to
$\theta$ and -1 for those equal to $1 - \theta$. Let $I_\theta$ be the set of indices j,k
for which $p_{kj}$ equals $\theta$ and $I_{\bar\theta}$ the set for which $p_{kj}$ is $1 - \theta$. Then
Equation 15 reduces to a linear equation whose solution is the
following weighted combination of the unconstrained estimates
$\hat{p}_{kj} = E_0(r_{kj}) / E_0(n_k)$, with weights $\hat{v}_k = E_0(n_k) / n$:

$$\theta = \frac{\sum_{j,k \in I_\theta} \hat{v}_k\, \hat{p}_{kj} + \sum_{j,k \in I_{\bar\theta}} \hat{v}_k (1 - \hat{p}_{kj})}{\sum_{j,k \in I_\theta} \hat{v}_k + \sum_{j,k \in I_{\bar\theta}} \hat{v}_k}. \tag{16}$$
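
Written in terms of expected counts rather than ratios, the weighted
combination in Equation 16 pools expected correct responses over the cells
constrained to equal the parameter and expected incorrect responses over
the cells constrained to equal its complement. The sketch below assumes
cells are given as (state, item) index pairs; the function name, the
argument names, and the numbers are hypothetical.

```python
def pooled_theta(exp_n, exp_r, equal_cells, complement_cells):
    """Pooled estimate of a shared item parameter (count form of Equation 16).

    exp_n[k]         : expected number of subjects in state k
    exp_r[k][j]      : expected number correct on item j in state k
    equal_cells      : (k, j) pairs constrained to p_kj = theta
    complement_cells : (k, j) pairs constrained to p_kj = 1 - theta
    """
    numerator = sum(exp_r[k][j] for k, j in equal_cells)
    numerator += sum(exp_n[k] - exp_r[k][j] for k, j in complement_cells)
    denominator = sum(exp_n[k] for k, _j in equal_cells)
    denominator += sum(exp_n[k] for k, _j in complement_cells)
    return numerator / denominator


# Hypothetical expected counts for 2 states and 2 items.
exp_n = [10.0, 30.0]
exp_r = [[2.0, 1.0],
         [27.0, 24.0]]
# State 1 answers both items correctly with probability theta;
# state 0 answers them correctly with probability 1 - theta.
theta = pooled_theta(exp_n, exp_r,
                     equal_cells=[(1, 0), (1, 1)],
                     complement_cells=[(0, 0), (0, 1)])
print(theta)
```

Because the numerator is a sum of counts that cannot exceed the matching
terms of the denominator, the estimate necessarily falls in the unit
interval.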


The  application  of  the  EM  algorithm  to  estimation  of  parameters  in 
the  latent  class  model  with  equality  and  complementarity  constraints 
can  be  summarized  as  follows. 

1. Select trial values of the parameters $P_0$ and $\mathbf{v}_0$.

2. Compute conditional state probabilities for all subjects,
using Equation 8.

3. Revise the parameter estimates of $\mathbf{v}$ via Equation 13 and the
estimates of $P$ via Equations 14 and 16.

4. Repeat Steps 1 through 3, using the revised estimates as new
trial values, until the trial values and the revised values no
longer differ significantly.
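
The steps above can be sketched for the unconstrained case as follows. This
is only a minimal illustration under stated assumptions: the function name,
the stopping rule (largest absolute parameter change below `tol`), and the
small data set are all hypothetical, and no equality or complementarity
constraints are imposed.

```python
import math


def em_latent_class(X, P0, v0, tol=1e-8, max_iter=500):
    """Unconstrained EM for a latent class model with dichotomous items.

    Returns (P, v, history); history[m] is log L* at the m-th trial values.
    """
    n, n_items, n_states = len(X), len(X[0]), len(v0)
    P = [row[:] for row in P0]
    v = list(v0)
    history = []
    for _iteration in range(max_iter):
        exp_n = [0.0] * n_states
        exp_r = [[0.0] * n_items for _k in range(n_states)]
        log_lik = 0.0
        for x in X:
            # Step 2: conditional state probabilities (Bayes' theorem).
            joint = []
            for k in range(n_states):
                like = v[k]
                for j in range(n_items):
                    like *= P[k][j] if x[j] == 1 else 1.0 - P[k][j]
                joint.append(like)
            total = sum(joint)
            log_lik += math.log(total)
            for k in range(n_states):
                post = joint[k] / total
                exp_n[k] += post                # expected subjects in state k
                for j in range(n_items):
                    exp_r[k][j] += x[j] * post  # expected correct responses
        history.append(log_lik)
        # Step 3: revise the estimates as ratios of expected counts.
        new_v = [exp_n[k] / n for k in range(n_states)]
        new_P = [[exp_r[k][j] / exp_n[k] for j in range(n_items)]
                 for k in range(n_states)]
        change = max(
            max(abs(a - b) for a, b in zip(new_v, v)),
            max(abs(a - b) for rn, ro in zip(new_P, P) for a, b in zip(rn, ro)),
        )
        P, v = new_P, new_v
        # Step 4: stop when trial and revised values agree closely.
        if change < tol:
            break
    return P, v, history


# Hypothetical responses of 22 subjects to 4 items.
X = ([[1, 1, 1, 1]] * 5 + [[0, 0, 0, 0]] * 5 +
     [[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1],
      [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0],
      [1, 1, 0, 0], [0, 0, 1, 1]])
P_hat, v_hat, history = em_latent_class(
    X, P0=[[0.3] * 4, [0.7] * 4], v0=[0.5, 0.5])
print(v_hat, history[-1])
```

The running time per iteration is proportional to the number of subjects
times the number of states times the number of items; the contingency table
is never formed.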

The  key  computations  in  this  algorithm  involve  ratios  of  counts  or 
estimates  of  counts  in  which  the  denominators  are  always  at  least  as 
big  as  the  numerators.  The  constraint  that  all  estimates  lie  in  the 
unit  interval  is  therefore  automatically  satisfied.  This  is  a 
significant  feature  of  the  EM  approach  not  shared  by  the  Newton-Raphson 
algorithm when applied to the marginal likelihood function in Equation 5.

The  most  significant  problem  which  this  algorithm  is  likely  to 
encounter  in  practice  is  one  that  it  shares  with  all  existing 
algorithms  that  would  be  practical  to  use  on  latent  class  model 
problems.  It  was  noted  earlier  in  the  paper  that  the  maximum 
likelihood  equations  can  have  multiple  solutions.  In  problems  where 
there  are  multiple  solutions,  any  iterative  algorithm  will  tend  to  go 
to  a  solution  close  to  the  trial  values  initially  selected.  The 
resulting  solution  may  well  not  be  the  parameter  values  that  truly 
maximize  the  likelihood,  particularly  if  the  starting  values  are 
selected  arbitrarily.  It  is  therefore  a  good  idea  to  try  a  variety  of 
plausible  sets  of  starting  values. 

Goodman  (1974)  gives  an  algorithm  for  estimation  of  parameters  in 
complex  contingency  tables  where  some  of  the  variables  are  not 
observable.  The  specialization  of  his  algorithm  to  the  case  of 
dichotomous  responses  is  essentially  equivalent  to  the  algorithm  given 
here.  Since  it  is  intended  for  analysis  of  contingency  tables,  it 
assumes  that  the  joint  response  data  for  the  subjects  is  summarized  in 
that  form.  Latent  class  model  estimation  programs  implementing 
Goodman's algorithm, such as Clogg and Sawyer (1981), are limited in
terms of the number of items which they can accommodate, because the
multi-item contingency table quickly becomes unmanageable as the number
of items increases. The form of the algorithm given in this paper
deals with each individual response vector, rather than cell counts in
a contingency table. Hence, increasing the number of
items has no effect on the algorithm beyond the increase in running
time, which is directly proportional to the number of items. Actually,
the  effect  on  running  time  is  more  closely  proportional  to  the  square 
of  the  number  of  items  in  applications,  such  as  scaling,  in  which  the 
number  of  states  in  the  model  is  also  proportional  to  the  number  of 
items.  Nevertheless,  the  effect  is  much  more  manageable  than  the 
exponential  increase  in  the  number  of  cells  in  the  contingency  table 
with  which  an  algorithm  for  analysis  of  contingency  tables  must  deal. 

The  EM  algorithm  in  the  form  presented  in  this  paper  can  cope  with 
tests  comprised  of  many  items,  while  automatically  satisfying  the 
fundamental  constraint  that  the  parameter  estimates  all  fall  in  the 
unit  interval  and  any  further  equality  and  complementarity  constraints 
the  investigator  may  wish  to  impose  on  the  parameters.  This  fact, 
together  with  computational  simplicity  at  each  iteration,  makes  the 
algorithm  an  attractive  alternative  to  other  approaches  to  the 
calculation  of  the  maximum  likelihood  estimates  for  the  latent  class 
model.  Two  questions  arise:  one  about  how  many  models  of  interest  can 
be  formulated  using  only  equality  and  complementarity  constraints,  and 
a  second  one  about  the  possibility  that  there  are  other  special  kinds 
of  constraints  which  would  also  yield  easy  solutions  at  each  iteration 
of  the  algorithm. 

The  answer  to  the  first  question  is  that  many  latent  class  models 
of  interest  can  be  expressed  in  terms  of  equality  and  complementarity 
constraints on the parameters, including most of the models which have
been proposed to date. The latent distance model of Lazarsfeld and
Henry (1968) and the quasi-independence model of Goodman (1975), both
of which are generalizations of the Guttman simplex model for scaling
response patterns, fall in this category. Dayton and Macready (1976,
1980) have proposed analogs and extensions of these models for
applications in the analysis of learning hierarchies; their extensions
can also be expressed in terms of equality and complementarity
constraints. Paulson (1985) has proposed models for signed-number
addition test performance with one latent class for students who have
mastered the concept and other latent classes corresponding to classes
of subjects exhibiting certain systematic patterns of errors. These
models are not scaling models, but they are expressible in terms of
equality and complementarity constraints on the parameters.

If only equality and complementarity conditions constrain the item
parameters, then each $p_{kj}$ is influenced by exactly one parameter.
This rules out models which characterize each $p_{kj}$ in terms of
conjoint effects of item and state parameters, as the Rasch model does,
for example. It also rules out models that impose ordering constraints
on the $p_{kj}$'s. Thus, while many interesting models can be cast in
terms of equality and complementarity constraints, many others cannot.
Fortunately, models involving conjoint item and state effects and
models imposing ordering constraints can be formulated which lead to
easily solved forms of Equation 15, preserving the computational
simplicity of the EM algorithm.

EXTENSION TO MONOTONELY HOMOGENEOUS ITEMS

A set of items is said to be monotonely homogeneous if the
probabilities of correct response in different subject states fall in
the same order for all items. That is, for every pair of items j, j*
and every pair of states k, k*,

$$p_{kj} > p_{k^*j} \;\Rightarrow\; p_{kj^*} \ge p_{k^*j^*}. \tag{16}$$
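
The definition can be checked mechanically for a given matrix of
conditional response probabilities. The sketch below assumes the reading of
the condition in which ties are permitted in the implied ordering: a
violation occurs whenever one state is strictly higher than another on some
item and strictly lower on some other item. The function name and the
example matrices are hypothetical.

```python
def is_monotonely_homogeneous(P):
    """Check whether the rows (states) of P are ordered the same on every item.

    P[k][j] is the probability of a correct response to item j in state k.
    """
    n_states, n_items = len(P), len(P[0])
    for k in range(n_states):
        for h in range(n_states):
            higher = any(P[k][j] > P[h][j] for j in range(n_items))
            lower = any(P[k][j] < P[h][j] for j in range(n_items))
            if higher and lower:
                return False  # states k and h are ordered differently
    return True


# Hypothetical matrices: 3 states by 2 items, and 2 states by 2 items.
ordered = [[0.2, 0.3], [0.5, 0.6], [0.9, 0.6]]
crossed = [[0.2, 0.7], [0.5, 0.4]]
print(is_monotonely_homogeneous(ordered), is_monotonely_homogeneous(crossed))
```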


Any set of items conforming to a unidimensional item response theory
which requires the probabilities of correct response to items be
monotonically increasing functions of ability is monotonely
homogeneous. All standard item response theory models impose this
condition. On the other hand, if a set of items is monotonely
homogeneous, then the averages of the conditional probabilities of
correct response over all the items, given the respective states, must
fall in the same order as the conditional probabilities for individual
items. Let us define ability level for subjects in a given state to be
the average of the conditional probabilities of correct response over
all the items, i.e. the "true proportion correct". Consider the
function associated with each item which is obtained when one plots the
conditional probability of correct response to the item, given state,
versus true proportion correct. This function is necessarily
monotonically increasing for every item. That is, any monotonely
homogeneous set of items is associated with a corresponding set of
monotonically increasing item response functions. Thus, monotone
homogeneity of a set of items is a necessary and sufficient condition
for the items to be representable by an item response theory model with
monotonically increasing item response functions.

The assumption of monotone homogeneity is of interest from a
couple of different perspectives. Since it is the minimal assumption
concerning the form of the item response function sufficient to yield a
model with monotonically increasing tracelines, it is worth considering
how far the theory can be developed with no further assumptions
regarding the form of the functions. Mokken (1971), who first
emphasized the importance of the assumption, Mokken and Lewis (1982),
and Lewis (1985) have pursued this idea in developing a nonparametric
approach to item response theory. A fundamental problem in this
development is the estimation of item response functions. In this
section we show how to obtain marginal maximum likelihood estimates for
these functions in models restricted to a finite number of states. The
restriction to a finite number of ability states would seem to be an
extreme limitation on the value of such an approach, but Bock and
Aitkin (1981) have shown that it is quite workable in application to
standard item response theory models.

From  another  point  of  view,  the  assumption  of  monotone  homogeneity 
is  interesting  because  it  provides  a  definitive  criterion  for  deciding 
that  a  unidimensional  representation  of  responses  to  a  given  set  of 
items is inappropriate. Holland (1981) has derived from the assumption
a series of necessary conditions observed data must satisfy in order to
be capable of representation by a unidimensional item response theory
model. Unlike the assumption itself, these conditions can be tested
without  estimating  the  item  parameters.  The  simplest  of  the  conditions 
is  that  interitem  correlations  must  be  nonnegative.  Paulson  (1985)  has 
shown  that  this  condition  is  violated  in  an  analysis  of  signed-number 
addition  test  data  from  a  study  by  Tatsuoka  and  Birenbaum  (1979). 
Paulson  describes  a  simple  latent  class  model  which  does  give  a  good 
account  of  this  data.  This  model  is  not  a  scaling  model:  the  states 
in  the  model  correspond  either  to  mastery  of  the  concept  or  to  one  of  a 
set  of  systematic  misconceptions  students  fall  into  regarding  the 
concept.  The  latter  states  are  not  ordered.  The  nonnegativity  of 
interitem  correlations  is  a  simple  but  weak  criterion  for  testing 
monotone homogeneity. Holland (1981) gives more stringent tests in
terms of nonnegativity of correlations between indices based on
combined item responses. We will describe a more direct approach later
in this section: the likelihood ratio test of the goodness of fit of
the monotonely homogeneous finite-state model compared to the fit of
the corresponding latent class model without the monotone homogeneity
constraint.

Modification of the Algorithm to Provide Monotone Homogeneity

Recall  that  at  each  iteration  of  the  EM  algorithm,  the  problem  of 
maximizing  the  conditional  likelihood,  given  the  responses  and  trial 
values  of  the  parameters,  reduces  to  maximization  of  two  separate 
terms,  one  depending  only  on  the  state  probability  distribution 
parameters  and  the  other  only  on  item  parameters,  i.e.  parameters 
affecting conditional response probabilities, given the subject's
state. These terms were given above in Equations 11 and 12.

When the item parameters are unconstrained, each term in the sum
in Equation 12 can be maximized separately. If the parameters are
constrained,  but  the  constraints  apply  separately  to  each  item,  then 
the  set  of  terms  involving  each  item  can  be  maximized  separately. 
Monotone  homogeneity  constraints  are  of  this  type:  they  specify  the 
ordering  of  the  conditional  correct  response  probabilities  to  a 
particular  item,  given  the  respective  states,  but  say  nothing  about 
relationships  between  response  probabilities  involving  different  items. 

Thus, the maximization of Equation 12 can be written as

$$\max L_P = \sum_{j=1}^{J} \max \sum_{k=1}^{s} \bigl[ E_0(r_{kj}) \log p_{kj} + E_0(n_k - r_{kj}) \log(1 - p_{kj}) \bigr]. \tag{17}$$

Each of the maximizations on the right hand side of Equation 17
corresponds to the maximum likelihood equation for estimating the
success probabilities in s independent groups for a particular item.
Carrying out the maximization under ordering constraints has a known
solution which bears an interesting relation to the algorithm given above
for dealing with equality constraints.

Consider the problem of maximum likelihood estimation of
proportions in s independent groups. Its solution is the familiar

$$\bar{p}_k = r_k / n_k \quad \text{for } k = 1, \ldots, s.$$

In our problem, $r_k$ and $n_k$ are replaced by $E_0(r_{kj})$ and
$E_0(n_k)$, their conditional expectations for item j, given the
observed responses and trial values of the parameters.

Now let us add the constraint that

$$p_1 \le p_2 \le \cdots \le p_s.$$

Barlow et al. (1972) have shown how to treat this problem in terms of
isotonic regression. The solution is built upon the unconstrained
maximum likelihood estimators just mentioned, which are referred to by
Barlow et al. as basic estimates. These basic estimates are
amalgamated into solution blocks of adjacent groups, within which each
group's estimate is set equal to the weighted average of the basic
estimates for the groups comprising the solution block.

The solution blocks are formed as follows. At first each group
forms its own block. If the basic estimates for all the groups fall in
the right order, then the ordering constraint is not active and the
constrained estimates coincide with the basic estimates. A group will
continue to form its own solution block unless one or both of the
following conditions hold:

a)  its  inclusion  with  the  group  or  adjacent  set  of  groups 
immediately  above  it  in  the  hypothesized  order  would  increase 
the  average  for  the  resulting  block;  or 

b)  its  inclusion  with  the  group  or  adjacent  set  of  groups 
immediately  below  it  in  the  hypothesized  order  would  decrease 
the  average  for  the  resulting  block. 

The existence of either condition implies a violation of the ordering
constraint which can be remedied by combining the groups involved and
setting the estimate of probability correct in each of these groups
equal to the weighted average of their basic estimates.
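
One common way to carry out this pooling is the pool-adjacent-violators
procedure, sketched below. It is offered only as an illustration of how
solution blocks can be formed by repeated weighted averaging; it is a
different procedure from the block-forming algorithm recommended by Barlow
et al., although both arrive at the same isotonic solution. The function
name and the example values are hypothetical, and the weights (expected
group sizes) are assumed to be positive.

```python
def pool_adjacent_violators(basic, weights):
    """Nondecreasing weighted least-squares fit to the basic estimates.

    basic[k]   : basic (unconstrained) estimate for group k
    weights[k] : positive weight for group k, e.g. its expected size
    """
    blocks = []  # each block holds [weighted mean, total weight, group count]
    for value, weight in zip(basic, weights):
        blocks.append([value, weight, 1])
        # Merge backwards while the last two blocks violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w, w, c1 + c2])
    fitted = []
    for mean, _weight, count in blocks:
        fitted.extend([mean] * count)  # every group in a block shares its mean
    return fitted


# Hypothetical basic estimates for one item in 4 ordered states.
fitted = pool_adjacent_violators([0.2, 0.6, 0.4, 0.9], [10.0, 20.0, 20.0, 5.0])
print(fitted)
```

In this example the second and third groups violate the hypothesized order,
so they are combined into one solution block whose common estimate is their
weighted average.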

Let the weighted average of the basic estimates in the adjacent
set of groups with indices running from t through u be denoted by

\[
\mathrm{Av}(t, u) = \frac{\sum_{k=t}^{u} E_0(n_k)\, p_k}{\sum_{k=t}^{u} E_0(n_k)} . \tag{18}
\]


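The constrained maximum likelihood estimates can be expressed in
terms of "max-min" formulas in four different but equivalent ways:

\[
\begin{aligned}
p_k^{*} &= \max_{t \le k} \, \min_{u \ge k} \, \mathrm{Av}(t, u) \\
        &= \min_{u \ge k} \, \max_{t \le k} \, \mathrm{Av}(t, u) \\
        &= \max_{t \le k} \, \min_{u \ge t} \, \mathrm{Av}(t, u) \\
        &= \min_{u \ge k} \, \max_{t \le u} \, \mathrm{Av}(t, u) .
\end{aligned}
\tag{19}
\]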
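The weighted average and the first two max-min forms can be checked numerically by brute force. The following is a minimal sketch, not an efficient procedure; the function names `av`, `max_min`, and `min_max` are illustrative assumptions, indices are zero-based, and the weights `w` stand in for the expected counts $E_0(n_k)$.

```python
def av(p, w, t, u):
    """Weighted average of basic estimates p[t..u] with weights w[t..u]."""
    num = sum(w[k] * p[k] for k in range(t, u + 1))
    den = sum(w[k] for k in range(t, u + 1))
    return num / den


def max_min(p, w):
    """First form: max over t <= k of min over u >= k of Av(t, u)."""
    s = len(p)
    return [
        max(min(av(p, w, t, u) for u in range(k, s)) for t in range(k + 1))
        for k in range(s)
    ]


def min_max(p, w):
    """Second form: min over u >= k of max over t <= k of Av(t, u)."""
    s = len(p)
    return [
        min(max(av(p, w, t, u) for t in range(k + 1)) for u in range(k, s))
        for k in range(s)
    ]


# Five groups with equal weights, as in the hypothetical example of Table 1.
basic = [0.50, 0.60, 0.70, 0.40, 0.90]
weights = [1.0] * 5
constrained = max_min(basic, weights)
```

For the equal-weight example, both forms return .50 for the first group, 1.7/3 (about .57) for the three middle groups, and .90 for the last group.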


The  result  given  by  Equations  18  and  19  is  what  one  would  obtain 
using  Equation  16  to  impose  the  constraint  that  conditional 
probabilities of correct response in states belonging to the same
solution  block  must  be  equal.  The  main  difference  between  the 
algorithm  to  impose  simple  equality  constraints  and  the  algorithm 
necessary  to  provide  monotone  homogeneity  is  that  the  solution  blocks 
and  the  equality  constraints  implicit  in  them  are  not  given  beforehand 
and  can  change  from  one  iteration  to  the  next.  The  latter  algorithm 
must  take  this  into  account. 

The  Up-and-Down  Blocks  Algorithm.  There  are  many  ways  one  can 
determine  the  solution  blocks  needed  to  satisfy  Equation  19.  Barlow  et 
al.  (1972)  recommend  a  procedure  due  to  Kruskal  (1964),  called  the 
"Up-and-Down  Blocks"  algorithm.  Key  terms  in  the  tests  used  in  the 
algorithm are defined as follows. Let $B_-$, $B$, $B_+$ be three
consecutive blocks in order. Block $B$ is said to be up-satisfied if
$\mathrm{Av}(B) < \mathrm{Av}(B_+)$. It is said to be down-satisfied if
$\mathrm{Av}(B_-) < \mathrm{Av}(B)$. At each stage of the algorithm one
block is active; this may be amalgamated with an adjacent block or, if
it is up-satisfied and down-satisfied, the next block becomes active.
By convention, the first
block  in  order  is  down-satisfied  and  the  last  block  is  up-satisfied. 

The  exact  sequence  of  events  is  as  follows. 

1.  At  the  start,  each  state  is  a  separate  solution  block.  State 
1  is  initially  specified  to  be  the  active  block. 

2. Test to see if the active block is up-satisfied. If it is, go
to the next step. If it is not, pool the active block with
the next higher block; the new block becomes active. Go to
Step 3.

3. Test to see if the active block is down-satisfied. If it is,
go to Step 4. If it is not, pool the active block with the
next lower block; the new block becomes active. Go back to
Step 2.

4.  If  the  active  block  does  not  contain  the  highest  state,  make 
the  next  higher  block  active  and  go  back  to  Step  2.  If  the 
active  block  contains  the  highest  state,  the  algorithm  is 
finished. 
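The four steps above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the authors' program: the function name `up_and_down_blocks` and the block representation are hypothetical, indices are zero-based, and the weights `w` play the role of the expected counts $E_0(n_k)$ (equal weights if omitted).

```python
def up_and_down_blocks(p, w=None):
    """Return order-constrained estimates for basic estimates p, weights w."""
    if w is None:
        w = [1.0] * len(p)
    # Step 1: each state is its own block: [weighted sum, weight, size].
    blocks = [[wi * pi, wi, 1] for pi, wi in zip(p, w)]

    def av(i):
        return blocks[i][0] / blocks[i][1]

    def pool(i):
        """Merge block i with the next higher block i + 1."""
        higher = blocks.pop(i + 1)
        for c in range(3):
            blocks[i][c] += higher[c]

    active = 0
    while True:
        # Step 2: the last block is up-satisfied by convention.
        if active < len(blocks) - 1 and not av(active) < av(active + 1):
            pool(active)
        # Step 3: the first block is down-satisfied by convention.
        if active > 0 and not av(active - 1) < av(active):
            active -= 1
            pool(active)
            continue  # back to Step 2 with the pooled block active
        # Step 4: stop at the highest block, otherwise move up.
        if active == len(blocks) - 1:
            break
        active += 1

    fitted = []
    for total, weight, size in blocks:
        fitted.extend([total / weight] * size)
    return fitted


estimates = up_and_down_blocks([0.50, 0.60, 0.70, 0.40, 0.90])
```

With the basic estimates .50, .60, .70, .40, .90 and equal weights, the third group is pooled first with the fourth (average .55) and then with the second (average 1.7/3, about .57), after which every block is both up-satisfied and down-satisfied.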

The  sequence  of  tests  and  actions  to  determine  the  solution  blocks 
is  given  for  a  hypothetical  example  in  Table  1.  In  the  example,  there 
are  five  groups  with  equal  sample  sizes,  so  that  unweighted  averages 
are  used.  For  the  groups  in  their  hypothesized  order,  the  basic 
estimates  are  .50,  .60,  .70,  .40,  and  .90,  respectively.  When  the 
algorithm  encounters  the  violation  of  monotone  homogeneity  in  comparing 
the  third  and  fourth  groups,  adjustments  are  made  resulting  in  the 
final  estimates  .50,  .57,  .57,  .57,  .90. 

Insert  Table  1  about  here 

In  summary,  monotone  homogeneity  of  items  is  provided  by  modifying 
the EM algorithm for unconstrained marginal maximum likelihood
estimation  as  follows.  At  each  iteration,  compute  the  unconstrained 
estimates  and  then  apply  the  Up-and-Down  Blocks  algorithm  to  the 
results  for  each  item.  Use  these  monotonely  homogeneous  values  as 
trial  values  on  the  next  iteration.  Iterate  until  the  stopping 
criterion  you  are  using  is  satisfied. 
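The modified iteration can be sketched as follows. This is a minimal sketch under several assumptions: the E-step uses the usual latent class posterior under local independence, the names `pava` and `em_monotone` are hypothetical, and the pooling routine is a compact stack-based pool-adjacent-violators variant swapped in for the literal Up-and-Down Blocks bookkeeping (it yields the same isotonic fit). The stopping rule, a small change in the log-likelihood, is also only one possible choice.

```python
import math


def pava(y, w):
    """Weighted non-decreasing fit by pooling adjacent violators."""
    blocks = []  # each block: [weighted sum, weight, size]
    for yi, wi in zip(y, w):
        blocks.append([yi * wi, wi, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]):
            s, wt, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += wt
            blocks[-1][2] += n
    fit = []
    for s, wt, n in blocks:
        fit.extend([s / wt] * n)
    return fit


def em_monotone(X, v, p, n_iter=200, tol=1e-8):
    """X: 0/1 response vectors; v[k]: state probabilities;
    p[k][j]: P(correct on item j | state k). Returns (v, p, loglik)."""
    N, J, S = len(X), len(X[0]), len(v)
    v, p = list(v), [row[:] for row in p]   # do not modify the caller's lists
    prev = -math.inf
    loglik = prev
    for _ in range(n_iter):
        n = [0.0] * S                       # expected state counts E0(n_k)
        r = [[0.0] * J for _ in range(S)]   # expected correct counts E0(r_kj)
        loglik = 0.0
        for x in X:                         # E-step
            joint = [v[k] * math.prod(p[k][j] if x[j] else 1.0 - p[k][j]
                                      for j in range(J)) for k in range(S)]
            total = sum(joint)
            loglik += math.log(total)       # log-likelihood at current values
            for k in range(S):
                post = joint[k] / total
                n[k] += post
                for j in range(J):
                    r[k][j] += post * x[j]
        v = [n[k] / N for k in range(S)]    # M-step for state probabilities
        for j in range(J):                  # M-step for each item
            basic = [r[k][j] / n[k] for k in range(S)]
            fit = pava(basic, n)            # impose the ordering over states
            for k in range(S):
                p[k][j] = min(max(fit[k], 1e-6), 1.0 - 1e-6)
        if loglik - prev < tol:
            break
        prev = loglik
    return v, p, loglik
```

The clamping of the estimates to the interval (1e-6, 1 - 1e-6) is a numerical safeguard added for the sketch and is not part of the procedure described in the text.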
A Test for Monotone Homogeneity

When there are J items on a test and one is fitting an
unconstrained latent class model with s states, there are Js free item
parameters to be estimated. Let $m_j$ denote the number of level sets
determined by the Up-and-Down Blocks algorithm for item j. The number
of free item parameters in the model with the monotone homogeneity
constraint is then $\sum_j m_j$. Let $L_u$ and $L_m$ denote the maxima
of the marginal likelihood function evaluated under the unconstrained
and monotonely constrained hypotheses, respectively. If the monotone
homogeneity hypothesis is correct, then asymptotically the likelihood
ratio test statistic

\[
-2 \log \lambda = 2(\log L_u - \log L_m) \tag{20}
\]

has a chi-squared distribution with $Js - \sum_j m_j$ degrees of
freedom. This fact can be used to set up critical regions for tests of
the hypothesis.
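The test statistic, its degrees of freedom, and the upper-tail probability can be computed with the standard library alone. The sketch below is illustrative: the function names `chi2_sf` and `monotone_lr_test` are assumptions, and the tail probability is obtained from the standard recurrence for the regularized upper incomplete gamma function, which is valid for positive integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def monotone_lr_test(loglik_u, loglik_m, n_items, n_states, level_sets):
    """Return (statistic, df, p-value) for the statistic of Equation 20.

    level_sets[j] is the number of level sets m_j found for item j.
    """
    stat = 2.0 * (loglik_u - loglik_m)
    df = n_items * n_states - sum(level_sets)
    return stat, df, chi2_sf(stat, df)


# Degrees of freedom when 20 items, 5 states, and 81 level sets are involved.
df_example = 20 * 5 - 81
p_example = chi2_sf(82.30, df_example)
```

With 20 items, five states, and 81 level sets in total, `df_example` is 19, and a statistic of 82.30 gives a tail probability far below .0001.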

Example.  Figure  1  gives  graphs  of  item  response  functions  for 
some  signed-number  addition  test  data  obtained  by  Tatsuoka  and 
Birenbaum  (1979).  Five  pairs  of  response  functions  are  depicted  -  one 
pair  for  each  of  five  types  of  items  on  the  test.  Each  pair  consists 
of  an  unconstrained  item  response  function  and  a  function  constrained 
to be monotonically homogeneous. The analysis refers to a special
scoring  of  responses  which  only  attends  to  whether  the  magnitude  of  the 
response  is  correct,  disregarding  the  sign  of  the  answer.  The  curves 
given  are  actually  averages  of  four  separate  curves,  because  there  were 
four  items  of  each  type.  Within  types,  the  curves  are  practically 
identical.  The  types  vary  in  terms  of  whether  the  larger  of  the 
addends  appears  first  or  second  in  the  sum,  and  in  terms  of  the  signs 
of  the  addends.  An  item  such  as  "10+-5"  would  be  of  the  type 
designated  L+-S  on  Figure  1,  for  example. 

Tatsuoka  and  Birenbaum  found  that  if  one  examines  the  magnitude  of 
the  responses  and  the  sign  of  the  responses  to  these  items  separately, 
some  very  interesting  patterns  emerge.  Some  groups  of  subjects  fall 
into  systematic  patterns  of  errors  and  correct  response  which 
correspond  to  use  of  erroneous  rules.  Paulson  (1985)  found  that  a 
five-state  latent  class  model  would  give  a  good  account  of  the 
magnitude  responses.  That  is  why  five-state  models  were  used  to  obtain 
the  curves  in  Figure  1.  Examination  of  the  figures  reveals  that  the 
unconstrained and monotonically homogeneous curves are very similar for
four  of  the  five  item  types.  However,  for  the  type  -L+-S,  the 
unconstrained  curve  is  practically  "U"-shaped.  On  the  basis  of  these 
curves,  we  would  expect  to  reject  the  hypothesis  of  monotone 
homogeneity.  Since  there  are  20  items  on  the  test  and  five  states  in 
the  model,  the  unconstrained  model  has  100  free  item  parameters.  It 
turns out that the total number of level sets in the monotonically
constrained model is 81. Thus, there are 19 degrees of freedom for the
chi-squared test. We do in fact reject the null hypothesis:
$\chi^2(19) = 82.30$, p < .0001.

Further  insight  can  be  obtained  by  examining  the  data  in  Figure  1 
from  another  perspective.  Figure  2  shows  the  profiles  of  responses  to 
the  different  types  of  items  for  subjects  in  each  of  the  five  states. 
Subjects  in  State  4,  the  next  to  the  highest  state  in  terms  of  number 
correct,  do  well  on  all  item  types,  except  Type  -L+-S.  Subjects  in  the 
lowest state in terms of number correct, State 1, do well on Type
-L+-S,  but  poorly  on  all  the  rest.  Type  -L+-S  is  the  only  type  on  the 
test  for  which  one  should  add  absolute  values  of  the  addends;  one 
should  subtract  on  all  the  rest.  Subjects  in  State  1  appear  to  follow 
the  rule,  "Always  add,"  whereas  subjects  in  State  4  appear  to  follow 
the  rule,  "Always  subtract."  Clearly,  clusters  of  subjects  following 
erroneous  rules  of  this  sort  can  lead  to  violations  of  monotone 
homogeneity. 

SUMMARY 

This  paper  has  reviewed  the  application  of  the  EM  algorithm  to 
parameter  estimation  in  the  latent  class  model  and  shown  how  it  can  be 
used  to  extend  existing  algorithms  to  cover  monotone  homogeneity 
constraints  on  the  item  parameters.  The  assumption  of  monotone 
homogeneity  is  interesting  from  a  couple  of  perspectives.  Items  on  a 
test have monotonically increasing item response functions if and only
if they are monotonely homogeneous, so the assumption leads to a
minimally  restrictive  form  of  item  response  theory.  If  the  assumption 
is  violated,  a  unidimensional  item  response  theory  is  clearly 
inappropriate  for  the  data  in  question.  The  paper  has  shown  that,  if 
we  restrict  ourselves  to  finite-state  latent-class  models,  we  can  use 
the  EM  algorithm  to  obtain  marginal  maximum  likelihood  estimates  of  the 
item  response  functions  under  the  minimal  monotone  homogeneity 
assumption.  These  "nonparametric"  estimates  should  be  very  useful  when 
the  assumption  holds.  On  the  other  hand,  if  the  assumption  does  not 
hold,  we  would  certainly  want  to  know  about  it.  With  the  marginal 
maximum  likelihood  estimates  in  hand  for  both  the  monotonely 
homogeneous  latent  class  model  and  the  unconstrained  model  with  the 
same  number  of  states,  we  can  calculate  a  direct  likelihood  ratio  test 
of  the  monotone  homogeneity  hypothesis. 
Table 1. Illustration of the "Up-and-Down Blocks" Algorithm.

Each line indicates the outcome of the tests made on an active
block. The current estimates of the correct response
probabilities for groups comprising the active block are
underlined at the left. The action taken is given at the right.